
    DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters

    When will a server fail catastrophically in an industrial datacenter? Is it possible to forecast these failures so that preventive actions can be taken to increase the reliability of a datacenter? To answer these questions, we have studied what are probably the largest publicly available datacenter traces, containing more than 104 million events from 12,500 machines. Among these samples, we observe and categorize three types of machine failures, all of which are catastrophic and may lead to information loss or, even worse, reliability degradation of a datacenter. We further propose DC-Prophet, a two-stage framework based on a One-Class Support Vector Machine and a Random Forest. DC-Prophet extracts surprising patterns and accurately predicts the next failure of a machine. Experimental results show that DC-Prophet achieves an AUC of 0.93 in predicting the next machine failure, and an F3-score of 0.88 (out of 1). On average, DC-Prophet outperforms other classical machine learning methods by 39.45% in F3-score.
    Comment: 13 pages, 5 figures, accepted by 2017 ECML PKDD
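    As a rough illustration of the two-stage idea described above (a One-Class SVM flags anomalous machine behaviour, and a Random Forest then predicts whether a failure follows), the Python sketch below uses synthetic features and scikit-learn; the feature construction, hyperparameters, and data are illustrative assumptions, not those of DC-Prophet.

```python
# Minimal two-stage sketch (not the authors' implementation): a One-Class SVM
# flags anomalous per-machine feature windows, then a Random Forest predicts
# whether a flagged window precedes a failure. Features are hypothetical.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))            # per-machine window features (synthetic)
y = (rng.random(5000) < 0.05).astype(int)  # 1 = failure within the next window

# Stage 1: learn the "normal" regime from non-failure windows only.
ocsvm = OneClassSVM(nu=0.05, kernel="rbf").fit(X[y == 0])
anomaly_score = ocsvm.decision_function(X).reshape(-1, 1)

# Stage 2: Random Forest on the raw features plus the anomaly score.
X_stage2 = np.hstack([X, anomaly_score])
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
rf.fit(X_stage2[:4000], y[:4000])

pred = rf.predict(X_stage2[4000:])
print("F3-score:", fbeta_score(y[4000:], pred, beta=3))  # recall-weighted, as in the abstract
```

    Training the first stage only on non-failure windows lets its decision score act as an "unusualness" feature for the second-stage classifier, and the F3-score weights recall heavily, which matches the paper's emphasis on not missing failures.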

    Forecasting Player Behavioral Data and Simulating in-Game Events

    Understanding player behavior is fundamental in game data science. Video games evolve as players interact with the game, so being able to foresee player experience would help ensure successful game development. In particular, game developers need to evaluate beforehand the impact of in-game events. Simulation optimization of these events is crucial to increase player engagement and maximize monetization. We present an experimental analysis of several methods to forecast game-related variables, with two main aims: to obtain accurate predictions of in-app purchases and playtime in an operational production environment, and to perform simulations of in-game events in order to maximize sales and playtime. Our ultimate purpose is to take a step towards the data-driven development of games. The results suggest that, even though the performance of traditional approaches such as ARIMA is still better, the outcomes of state-of-the-art techniques like deep learning are promising. Deep learning emerges as a well-suited general model that could be used to forecast a variety of time series with different dynamic behaviors.
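    For context, the kind of classical baseline mentioned above can be set up in a few lines; the sketch below fits an ARIMA model to a synthetic daily revenue series with statsmodels (the order (2, 1, 2), the series, and the forecast horizon are assumptions, not the paper's configuration).

```python
# Illustrative ARIMA forecast of a daily in-game revenue series (synthetic data).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
days = pd.date_range("2023-01-01", periods=365, freq="D")
# Weekly seasonality plus noise as a stand-in for daily in-app purchase revenue.
revenue = 100 + 20 * np.sin(2 * np.pi * np.arange(365) / 7) + rng.normal(0, 5, 365)
series = pd.Series(revenue, index=days)

model = ARIMA(series, order=(2, 1, 2)).fit()
forecast = model.forecast(steps=14)   # two-week-ahead point forecast
print(forecast.head())
```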

    Population mortality during the outbreak of Severe Acute Respiratory Syndrome in Toronto

    Background: Extraordinary infection control measures limited access to medical care in the Greater Toronto Area during the 2003 Severe Acute Respiratory Syndrome (SARS) outbreak. The objective of this study was to determine if the period of these infection control measures was associated with changes in overall population mortality due to causes other than SARS.
    Methods: Observational study of death registry data, using Poisson regression and interrupted time-series analysis to examine all-cause mortality rates (excluding deaths due to SARS) before, during, and after the SARS outbreak. The population of Ontario was grouped into the Greater Toronto Area (N = 2.9 million) and the rest of Ontario (N = 9.3 million) based upon the level of restrictions on delivery of clinical services during the SARS outbreak.
    Results: There was no significant change in mortality in the Greater Toronto Area before, during, and after the period of the SARS outbreak in 2003 compared to the corresponding time periods in 2002 and 2001. The rate ratio for all-cause mortality during the SARS outbreak was 0.99 [95% Confidence Interval (CI) 0.93–1.06] compared to 2002 and 0.96 [95% CI 0.90–1.03] compared to 2001. An interrupted time series analysis found no significant change in mortality rates in the Greater Toronto Area associated with the period of the SARS outbreak.
    Conclusion: Limitations on access to medical services during the 2003 SARS outbreak in Toronto had no observable impact on short-term population mortality. Effects on morbidity and long-term mortality were not assessed. Efforts to contain future infectious disease outbreaks due to influenza or other agents must consider effects on access to essential health care services.
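    The core comparison can be sketched as a Poisson regression of weekly death counts on an outbreak-period indicator with a log-population offset, as below; the data, outbreak window, and covariates are synthetic illustrations, not the study's.

```python
# Poisson-regression sketch: weekly all-cause deaths (excl. SARS) with an indicator
# for the outbreak period and a log-population offset. Data are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
weeks = np.arange(104)                                  # two years of weekly counts
during = ((weeks >= 60) & (weeks < 72)).astype(int)     # hypothetical outbreak window
population = np.full(104, 2.9e6)
deaths = rng.poisson(lam=400, size=104)                 # synthetic weekly deaths

df = pd.DataFrame({"deaths": deaths, "during": during,
                   "week": weeks, "log_pop": np.log(population)})

# Rate ratio for the outbreak period, adjusted for a linear secular trend.
fit = smf.glm("deaths ~ during + week", data=df, offset=df["log_pop"],
              family=sm.families.Poisson()).fit()
print("Rate ratio (outbreak vs. non-outbreak):", np.exp(fit.params["during"]))
```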

    Modelling informative time points: an evolutionary process approach

    Real-world time series sometimes exhibit various types of "irregularities": missing observations, observations not collected regularly over time for practical reasons, observation times driven by the series itself, or outlying observations. However, the vast majority of methods of time series analysis are designed for regular time series only. A particular case of irregularly spaced time series is that in which the sampling procedure over time also depends on the observed values. In such situations, there is stochastic dependence between the process being modelled and the times of the observations. In this work, we propose a model in which the sampling design depends on the entire past history of the observed processes. Given the natural temporal order underlying data represented by a time series, a modelling approach based on evolutionary processes is a natural choice. We consider maximum likelihood estimation of the model parameters. Numerical studies with simulated and real data sets are performed to illustrate the benefits of this model-based approach.
    The authors acknowledge the Foundation FCT (Fundação para a Ciência e Tecnologia) as members of the research project PTDC/MAT-STA/28243/2017 and the Center for Research & Development in Mathematics and Applications of Aveiro University within project UID/MAT/04106/2019.
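    As a toy illustration of joint likelihood-based estimation when sampling is informative (not the authors' evolutionary-process model), the sketch below assumes AR(1) values and exponential waiting times whose rate depends on the last observed value, and maximizes the joint log-likelihood with scipy.

```python
# Toy model: x_{k+1} = phi * x_k + eps, and the gap to the next observation is
# Exp(rate = exp(a + b * x_k)), so observation times depend on the observed values.
# Parameters are fitted jointly by maximum likelihood. Purely illustrative.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, expon

rng = np.random.default_rng(3)
phi_true, sigma_true, a_true, b_true = 0.7, 1.0, -1.0, 0.5
x, gaps = [0.0], []
for _ in range(500):
    gaps.append(rng.exponential(1.0 / np.exp(a_true + b_true * x[-1])))
    x.append(phi_true * x[-1] + rng.normal(0, sigma_true))
x, gaps = np.array(x), np.array(gaps)

def negloglik(theta):
    phi, log_sigma, a, b = theta
    sigma = np.exp(log_sigma)
    ll_values = norm.logpdf(x[1:], loc=phi * x[:-1], scale=sigma).sum()
    rate = np.exp(a + b * x[:-1])
    ll_times = expon.logpdf(gaps, scale=1.0 / rate).sum()
    return -(ll_values + ll_times)

fit = minimize(negloglik, x0=np.zeros(4), method="Nelder-Mead")
print("phi, sigma, a, b =", fit.x[0], np.exp(fit.x[1]), fit.x[2], fit.x[3])
```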

    RNA Unwinding by NS3 Helicase: A Statistical Approach

    The study of double-stranded RNA unwinding by helicases is a problem of basic scientific interest. One such example is provided by studies on the hepatitis C virus (HCV) NS3 helicase using single-molecule mechanical experiments. HCV currently infects nearly 3% of the world population, and NS3 is a protein essential for viral genome replication. The objective of this study is to model the RNA unwinding mechanism based on previously published data and to study its characteristics and their dependence on force, ATP concentration, and NS3 protein concentration. In this work, RNA unwinding by NS3 helicase is hypothesized to occur in a series of discrete steps, with the steps themselves occurring in accordance with an underlying point process. A point-process-driven change point model is employed to model the RNA unwinding mechanism. The results are in broad agreement with findings from previous studies. A renewal process with gamma-distributed waiting times was found to model well the point process that drives the unwinding mechanism. The analysis suggests that the periods of constant extension observed during NS3 activity can indeed be classified into pauses and subpauses, and that each depends on the ATP concentration. The step size is independent of external factors and seems to have a median value of 11.37 base pairs. The steps themselves are composed of a number of substeps, with an average of about 4 substeps per step and an average substep size of about 3.7 base pairs. An interesting finding pertains to the stepping velocity: our analysis indicates that stepping velocity may be of two kinds, a low and a high velocity.
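    The renewal-process component can be illustrated by fitting a gamma distribution to the dwell times between unwinding steps, as in the short sketch below; the dwell times are synthetic and the shape and scale values are not those estimated in the study.

```python
# Sketch of the gamma renewal idea: maximum-likelihood fit of a gamma distribution
# to dwell times between steps (synthetic data; parameters are illustrative).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
dwell_times = rng.gamma(shape=4.0, scale=0.25, size=300)   # seconds between steps (synthetic)

# Fit with the location parameter fixed at zero.
shape, loc, scale = stats.gamma.fit(dwell_times, floc=0)
print(f"shape={shape:.2f}, scale={scale:.2f}, mean dwell={shape * scale:.2f} s")

# Under a gamma(k, theta) renewal process, each observed step is roughly the sum of
# k exponential sub-events, echoing the step/substep structure described above.
```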

    Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline

    From medical charts to national censuses, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. Ranging from electronic health records, to digitized imaging and laboratory reports, to public health datasets, healthcare today generates an incredible amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across multiple facets of healthcare practice and administration. Unfortunately, the ability to derive accurate and informative insights requires more than the ability to execute machine learning models. Rather, a deeper understanding of the data on which the models are run is imperative for their success. While a significant effort has been undertaken to develop models able to process the volume of data obtained during the analysis of millions of digitized patient records, it is important to remember that volume represents only one aspect of the data. In fact, drawing on data from an increasingly diverse set of sources, healthcare data presents an incredibly complex set of attributes that must be accounted for throughout the machine learning pipeline. This chapter focuses on highlighting such challenges and is broken down into three distinct components, each representing a phase of the pipeline. We begin with attributes of the data accounted for during preprocessing, then move to considerations during model building, and end with challenges to the interpretation of model output. For each component, we present a discussion around data as it relates to the healthcare domain and offer insight into the challenges each may impose on the efficiency of machine learning techniques.
    Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery; 20 pages, 1 figure

    Distributed Fine-Grained Traffic Speed Prediction for Large-Scale Transportation Networks based on Automatic LSTM Customization and Sharing

    Short-term traffic speed prediction has been an important research topic in the past decade, and many approaches have been introduced. However, providing fine-grained, accurate, and efficient traffic-speed prediction for large-scale transportation networks where numerous traffic detectors are deployed has not been well studied. In this paper, we propose DistPre, a distributed fine-grained traffic-speed prediction scheme for large-scale transportation networks. To achieve fine-grained and accurate traffic-speed prediction, DistPre customizes a Long Short-Term Memory (LSTM) model with an appropriate hyperparameter configuration for each detector. To make this customization process efficient and applicable to large-scale transportation networks, DistPre conducts LSTM customization on a cluster of computation nodes and allows any trained LSTM model to be shared between different detectors. If a detector observes a traffic pattern similar to another's, DistPre directly shares the existing LSTM model between the two detectors rather than customizing an LSTM model per detector. Experiments based on traffic data collected from freeway I5-N in California are conducted to evaluate the performance of DistPre. The results demonstrate that DistPre provides time-efficient LSTM customization and accurate fine-grained traffic-speed prediction for large-scale transportation networks.
    Comment: 14 pages, 7 figures, 2 tables, Euro-Par 2020 conference
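    The model-sharing idea can be sketched as follows: train an LSTM for one detector and reuse it for a second detector whose traffic pattern is sufficiently similar. The similarity test (Pearson correlation over aligned speed series), the 0.9 threshold, and the network architecture are assumptions for illustration, not DistPre's actual criteria.

```python
# Sketch: per-detector LSTM customization with model sharing between detectors
# that show similar traffic patterns. All data and thresholds are synthetic.
import numpy as np
import tensorflow as tf

def make_windows(series, lookback=12):
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., None], y

def train_lstm(series, lookback=12):
    X, y = make_windows(series, lookback)
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(lookback, 1)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=3, verbose=0)
    return model

rng = np.random.default_rng(5)
t = np.arange(288 * 7)                                 # a week of 5-minute speed samples
detector_a = 60 + 15 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 2, t.size)
detector_b = 62 + 15 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 2, t.size)

shared_models = {"A": train_lstm(detector_a)}

# Share the existing model if detector B's pattern correlates strongly with A's.
if np.corrcoef(detector_a, detector_b)[0, 1] > 0.9:
    model_b = shared_models["A"]          # reuse, no extra training
else:
    model_b = train_lstm(detector_b)      # customize a new LSTM for B
print("Model reused for B:", model_b is shared_models["A"])
```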

    Observed Reductions in Schistosoma mansoni Transmission from Large-Scale Administration of Praziquantel in Uganda: A Mathematical Modelling Study

    To date, schistosomiasis control programmes based on chemotherapy have largely aimed at controlling morbidity in treated individuals rather than at suppressing transmission. In this study, a mathematical modelling approach was used to estimate reductions in the rate of Schistosoma mansoni reinfection following annual mass drug administration (MDA) with praziquantel in Uganda over four years (2003-2006). In doing so, we aim to elucidate the benefits of MDA in reducing community transmission. Age-structured models were fitted to a longitudinal cohort followed up across successive rounds of annual treatment for four years (baseline: 2003; treatment: 2004-2006; n = 1,764). Instead of modelling contamination, infection, and immunity processes separately, these functions were combined in order to estimate a composite force of infection (FOI), i.e., the rate of parasite acquisition by hosts. MDA achieved substantial and statistically significant reductions in the FOI following one round of treatment in areas of low baseline infection intensity, and following two rounds in areas with high and medium intensities. In all areas, the FOI remained suppressed following a third round of treatment. This study represents one of the first attempts to monitor reductions in the FOI within a large-scale MDA schistosomiasis morbidity control programme in sub-Saharan Africa. The results indicate that the Schistosomiasis Control Initiative, as a model for other MDA programmes, is likely exerting a significant ancillary impact on reducing transmission within the community, and may provide health benefits to those who do not receive treatment. The results obtained will have implications for evaluating the cost-effectiveness of schistosomiasis control programmes and for the design of monitoring and evaluation approaches in general.
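    A much-simplified way to estimate a force of infection from cross-sectional data is the catalytic model P(a) = 1 - exp(-lambda * a); the sketch below fits lambda by maximum likelihood on synthetic age-prevalence data. This illustrates the FOI concept only and is not the age-structured reinfection model used in the study.

```python
# Toy catalytic-model FOI estimate: probability of ever having been infected by age a
# is 1 - exp(-lambda * a). Data are synthetic; lambda is fitted by maximum likelihood.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
ages = rng.integers(5, 60, size=1000)
lam_true = 0.08                                          # per-year FOI (synthetic)
infected = rng.random(1000) < (1 - np.exp(-lam_true * ages))

def negloglik(lam):
    p = np.clip(1 - np.exp(-lam * ages), 1e-9, 1 - 1e-9)
    return -np.where(infected, np.log(p), np.log(1 - p)).sum()

fit = minimize_scalar(negloglik, bounds=(1e-4, 1.0), method="bounded")
print("Estimated FOI (per year):", fit.x)

# Refitting on post-treatment reinfection data and comparing the two estimates gives
# the kind of FOI reduction reported above.
```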

    Time series modeling for syndromic surveillance

    BACKGROUND: Emergency department (ED) based syndromic surveillance systems identify abnormally high visit rates that may be an early signal of a bioterrorist attack. For example, an anthrax outbreak might first be detectable as an unusual increase in the number of patients reporting to the ED with respiratory symptoms. Reliably identifying these abnormal visit patterns requires a good understanding of the normal patterns of healthcare usage. Unfortunately, systematic methods for determining the expected number of ED visits on a particular day have not yet been well established. We present here a generalized methodology for developing models of expected ED visit rates. METHODS: Using time-series methods, we developed robust models of ED utilization for the purpose of defining expected visit rates. The models were based on nearly a decade of historical data at a major metropolitan academic, tertiary-care pediatric emergency department. The historical data were fit using trimmed-mean seasonal models, and additional models were fit with autoregressive integrated moving average (ARIMA) residuals to account for recent trends in the data. The detection capabilities of the model were tested with simulated outbreaks. RESULTS: Models were built both for overall visits and for respiratory-related visits, classified according to the chief complaint recorded at the beginning of each visit. The mean absolute percentage error of the ARIMA models was 9.37% for overall visits and 27.54% for respiratory visits. A simple detection system based on the ARIMA model of overall visits was able to detect 7-day-long simulated outbreaks of 30 visits per day with 100% sensitivity and 97% specificity. Sensitivity decreased as outbreak size decreased, dropping to 94% for outbreaks of 20 visits per day and 57% for 10 visits per day, all while maintaining the 97% benchmark specificity. CONCLUSIONS: Time series methods applied to historical ED utilization data are an important tool for syndromic surveillance. Accurate forecasting of total emergency department utilization, as well as the rates of particular syndromes, is possible. The multiple models in the system account for both long-term and recent trends, and an integrated alarm strategy combining these two perspectives may provide a more complete picture to public health authorities. The systematic methodology described here can be generalized to other healthcare settings to develop automated surveillance systems capable of detecting anomalies in disease patterns and healthcare utilization.
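    A minimal version of such an ARIMA-based alarm is sketched below: fit the model to historical daily visit counts, then flag days whose observed count exceeds the upper limit of the one-step prediction interval. The ARIMA order, interval width, and data (including the simulated 30-visit-per-day outbreak) are illustrative assumptions, not the paper's configuration.

```python
# ARIMA-based outbreak alarm on synthetic daily ED visit counts.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
days = pd.date_range("2022-01-01", periods=730, freq="D")
baseline = 150 + 25 * np.sin(2 * np.pi * np.arange(730) / 365)  # seasonal baseline
visits = rng.poisson(baseline)
visits[-7:] += 30                              # simulated 7-day outbreak of +30 visits/day

train, test = visits[:-14], visits[-14:]
model = ARIMA(pd.Series(train, index=days[:-14]), order=(2, 1, 1)).fit()

forecast = model.get_forecast(steps=14)
upper = forecast.conf_int(alpha=0.05).iloc[:, 1].to_numpy()     # 97.5% upper limit
alarms = test > upper
print("Alarm days:", list(days[-14:][alarms].date))
```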